The recent rapid and tremendous success of deep convolutional neural networks(CNN) on many challenging computer vision tasks largely derives from theaccessibility of the well-annotated ImageNet and PASCAL VOC datasets.Nevertheless, unsupervised image categorization (i.e., without the ground-truthlabeling) is much less investigated, yet critically important and difficultwhen annotations are extremely hard to obtain in the conventional way of"Google Search" and crowd sourcing. We address this problem by presenting alooped deep pseudo-task optimization (LDPO) framework for joint mining of deepCNN features and image labels. Our method is conceptually simple and rests uponthe hypothesized "convergence" of better labels leading to better trained CNNmodels which in turn feed more discriminative image representations tofacilitate more meaningful clusters/labels. Our proposed method is validated intackling two important applications: 1) Large-scale medical image annotationhas always been a prohibitively expensive and easily-biased task even forwell-trained radiologists. Significantly better image categorization resultsare achieved via our proposed approach compared to the previousstate-of-the-art method. 2) Unsupervised scene recognition on representativeand publicly available datasets with our proposed technique is examined. TheLDPO achieves excellent quantitative scene classification results. On the MITindoor scene dataset, it attains a clustering accuracy of 75.3%, compared tothe state-of-the-art supervised classification accuracy of 81.0% (when both arebased on the VGG-VD model).
展开▼